Accelerating a PARSEC Benchmark Using Portable Subword SIMD
Abstract
We present a case study of the GNU Compiler Collection (GCC) Vector Extensions in GCC 4.7. Specifically, we compare the performance of explicit vector code written with the GCC Vector Extensions against that of code automatically vectorized by the Intel C++ Compiler (ICC). Our analysis focuses on the interactions between data-level and thread-level parallelism in the streamcluster benchmark from the PARSEC benchmark suite, and examines the tradeoffs between portability and performance across different vectorization techniques.
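As a rough illustration of what explicit vector code with the GCC Vector Extensions can look like, the C sketch below computes a squared Euclidean distance, the kind of inner loop a clustering workload such as streamcluster relies on. It is not taken from the paper: the type name `v4sf`, the helper `dist_sq`, and the choice of four packed floats are all assumptions made for illustration.

```c
#include <string.h>

/* Four packed floats (16 bytes). The name v4sf is a hypothetical choice;
   vector_size is the GCC Vector Extensions attribute. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: squared Euclidean distance between two points
   of dimension dim. Not the benchmark's actual code. */
static float dist_sq(const float *a, const float *b, int dim)
{
    v4sf acc = {0.0f, 0.0f, 0.0f, 0.0f};
    int i = 0;

    /* Main loop: process four coordinates per iteration. */
    for (; i + 4 <= dim; i += 4) {
        v4sf va, vb;
        memcpy(&va, a + i, sizeof va);  /* load without assuming alignment */
        memcpy(&vb, b + i, sizeof vb);
        v4sf d = va - vb;               /* element-wise subtract */
        acc += d * d;                   /* element-wise multiply and add */
    }

    /* Horizontal sum of the four partial sums. */
    float sum = acc[0] + acc[1] + acc[2] + acc[3];

    /* Scalar tail for dimensions that are not a multiple of four. */
    for (; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

Calling `dist_sq(p, q, 6)` with `p = {1, 2, 3, 4, 5, 6}` and `q` all zeros returns 91. Because the arithmetic uses ordinary C operators rather than target-specific intrinsics, the same source can be compiled for different SIMD instruction sets, which is the portability side of the tradeoff the abstract refers to.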
Related resources
Accelerating multimedia with enhanced microprocessors
A minimalistic set of multimedia instructions introduced into PA-RISC microprocessors implements SIMD-MIMD parallelism with insignificant changes to the underlying microprocessor. Thus, a software video decoder attains MPEG video and audio decompression and playback at real-time rates of 30 frames per second, on an entry-level workstation. Our general-purpose parallel subword instructions can a...
High-performance and Energy-efficient Heterogeneous Subword Parallel Instructions
High instruction throughput and energy efficiency are becoming increasingly important design requirements for embedded and mobile computing systems. This paper presents the Quantized Color Pack extension (QCPX) ISA to improve execution performance of multimedia processing applications on programmable superscalar processors while reducing the energy consumption for these applications. QCPX expl...
A Characterization of the PARSEC Benchmark Suite for CMP Design
The shared-memory, multi-threaded PARSEC benchmark suite is intended to represent emerging software workloads for future systems. It is specifically intended for use by both industry and academia as a tool for testing new Chip Multiprocessor (CMP) designs. We analyze the suite in detail and identify bottlenecks using hardware performance counters. We take a systems-level approach, with an empha...
Modeling the Effects on Power and Performance from Memory Interference of Co-located Applications in Multicore Systems
In this study, we analyze interference trends when co-running multiple applications possessing varying degrees of memory intensity on multi-core processors. We conduct tests with PARSEC benchmark applications and explore energy consumption, execution times, and main memory accesses when interfering applications share last-level cache. We also explore how co-running applications are impacted when...
PARSEC Benchmark Suite: A Parallel Implementation on GPU using CUDA
Graphics Processing Units (GPUs) are a class of specialized parallel architectures with tremendous computational power. The Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on their GPUs. In this project, we target PARSEC benchmarks to provide orders of performance speed up and reducing overall execution time on mu...
Publication date: 2011